Genetic architecture of epigenetic cortical clock age in brain tissue from older individuals: alterations in CD46 and other loci

ABSTRACT The cortical epigenetic clock was developed in brain tissue as a biomarker of brain aging. As one way to identify mechanisms underlying aging, we conducted a GWAS of cortical age. We leveraged postmortem cortex tissue and genotyping array data from 694 participants of the Rush Memory and Aging Project and Religious Orders Study (ROSMAP; 11000,000 SNPs), and meta-analysed ROSMAP with 522 participants of Brains for Dementia Research (5,000,000 overlapping SNPs). We confirmed results using eQTL (cortical bulk and single nucleus gene expression), cortical protein levels (ROSMAP), and phenome-wide association studies (clinical/neuropathologic phenotypes, ROSMAP). In the meta-analysis, the strongest association was rs4244620 (p = 1.29 × 10−7), which also exhibited FDR-significant cis-eQTL effects for CD46 in bulk and single nucleus (microglia, astrocyte, oligodendrocyte, neuron) cortical gene expression. Additionally, rs4244620 was nominally associated with lower cognition, faster slopes of cognitive decline, and greater Parkinsonian signs (n ~ 1700 ROSMAP with SNP/phenotypic data; all p ≤ 0.04). In ROSMAP alone, the top SNP was rs4721030 (p = 8.64 × 10−8) annotated to TMEM106B and THSD7A. Further, in ROSMAP (n = 849), TMEM106B and THSD7A protein levels in cortex were related to many phenotypes, including greater AD pathology and lower cognition (all p ≤ 0.0007). Overall, we identified converging evidence of CD46 and possibly TMEM106B/THSD7A for potential roles in cortical epigenetic clock age.


Introduction
Biomarkers of aging are critical tools for developing and testing interventions to delay agingassociated declines in health, as well as for assessing mechanisms underlying aging processes.Given the high prevalence and burden of dementia in older populations, biomarkers of brain aging have public health relevance [1].
Epigenetic clocks are among the most successful biomarkers of aging to date; these clocks use DNA methylation (DNAm) states across the genome to yield a 'signature' of aging, trained against chronological age or other proxies for health status [2].Recently, a cortical epigenetic clock was developed, specifically using DNAm states in brain tissue [3].This clock integrates 347 CpG sites into an 'aging score' and performs particularly well in distinguishing neurodegenerative phenotypes compared to clocks designed in blood or other tissues [3].Identifying genes regulating cortical clock age is one pathway to discovering novel mechanisms underlying brain aging (although other factors which may be related to epigenetic age, such as environmental exposures, will be important to examine as well).
In our previous research, we calculated cortical clock age in postmortem prefrontal cortex specimens from deceased participants of the Religious Orders Study and Rush Memory and Aging Project, all of whom had been followed during life with annual clinical exams [4,5].We observed strong associations of cortical clock age with multiple clinical and pathologic dementia-related phenotypes, as well as other aging-related phenotypes.Thus, to begin to understand the genetic architecture of cortical clock -which is one approach to potentially identify mechanisms at the intersection of aging and neurodegeneration -we conducted a genome-wide association study (GWAS) of cortical clock age in three large cohorts of aging, all with cortical brain specimens, available DNA methylation array data, and genome-wide array data.

Study Populations
Since epigenetic clock aging appears to differ in older than in younger persons [3,6], we focused our work here in three cohorts which primarily include older participants: Religious Orders Study (ROS), Rush Memory and Aging Project (MAP), and Brains for Dementia Research (BDR).The Religious Orders Study was initiated in 1994 [7] and includes older priests, nuns, and brothers from across the US, free of known dementia at the time of enrolment.Participants agreed to annual neurological exams, neuropsychological testing, and blood draw.Over 1,500 participants completed a baseline evaluation as of November 2023.The follow-up rate and autopsies exceed 90%.The Rush Memory and Aging Project was established in 1997 [7], with virtually identical design and data collection, and includes older men and women from across the Chicago metropolitan area, without known dementia at enrolment; over 2,300 participants completed a baseline evaluation as of November 2023.The follow-up rate exceeds 90% and the autopsy rate exceeds 80%.Both studies were approved by an Institutional Review Board of Rush University Medical Center.All participants signed an informed consent and Anatomical Gift Act for organ donation.More than 2000 autopsies have been obtained to date.The Brains for Dementia Research brain bank was established in 2008 in the United Kingdom, as a research tissue bank across six dementia research centres, with uniform procedures [8].Participants were recruited from the community and provided informed consent for regular assessment and their consent to donate their brain for research upon death.At the time, these analyses were initiated, over 600 postmortem brains were available for research.The study was approved by the National Research Ethics Service.

Assessment of DNA methylation states and Epigenetic Clock Age
In ROSMAP brain specimens, DNA methylation was measured in tissue from the dorsolateral prefrontal cortex (DLPFC).Briefly, 100 mg frozen sections were thawed on ice, with the grey matter dissected from the white matter, as previously described in detail [9].The Qiagen QIAamp DNA mini protocol was used for DNA isolation.DNA methylation profiles were generated using the Illumina Infinium HumanMethylation450 platform.In more recent work with these specimens, processing methods were updated compared to our previous publications [9].The raw signal intensities were imported into the R statistical environment with functions from the methylumi package and further processed with the wateRmelon [10] package.Initial quality control assessment was performed using functions in the methylumi package to exclude samples with inefficient bisulfite conversion (<90%) as well as outliers.Further preprocessing was conducted using the wateRmelon package by applying a p-filter.Probes having more than 1% of samples with a detection p-value greater than 0.05 and a beadcount lower than 3 in more than 5% of samples were excluded.Finally, the filtered data were normalized with 'dasen.'Non-CpG SNP (single nucleotide polymorphism) probes, probes that had been reported to contain common (MAF > 5%) SNPs in the CG or single base extension position or probes that were non-specific or mismapped, were flagged and disregarded in the evaluation of our results.The resulting dataset for analysis here consisted of 730 samples with 423,841 probes each.At each probe, DNAm level was represented as a beta value, that is, the ratio of the methylated probe intensity to the sum of methylated and unmethylated probe intensities.The values ranged from 0 to 1, where a larger value indicates higher methylation.
In BDR, DNA was isolated from prefrontal cortex samples using the Qiagen AllPrep DNA/RNA 96 kit [3].Genome-wide DNA methylation was profiled using the Illumina EPIC DNAm array, which interrogates >850,000 DNA methylation sites.All work was done in the R statistical environment, and processing used either wateRmelon or bigmelon packages.Briefly, the quality control pipeline included (i) checking signal intensities and excluding poorly performing samples, (ii) calculating a bisulphite conversion statistic for each sample, excluding those with a conversion rate <80%, (iii) using p-filter function to exclude samples with >1% of probes with a detection p-value >0.05 and probes with >1% of samples with detection p-value >0.05, and (iv) removing crosshybridizing and SNP probes.The filtered data were normalized with 'dasen.'This resulted in 800,916 DNA methylation sites in 610 prefrontal cortex specimens.
The cortical clock was calculated using publicly available code (https://github.com/gemmashireby/CorticalCloc).Briefly, the original cortical clock was trained in cortex tissue, specifically including large samples of older participants, to focus on brain aging.The clock was trained only using 383,547 CpG sites that were common across the Illumina 450K and EPIC arrays.The cortical clock was trained against chronological age, and integrates 347 CpG sites into a brain age score.

Assessment of genome-wide array Data.
For the ROSMAP participants, genotyping was performed on either the Affymetrix GeneChip 6.0 platform (1,878 participants, 909,600 SNPs) or the Illumina OmniQuad Express platform (456 participants, 730,525 SNPs).DNA was extracted from whole blood, lymphocytes, or frozen brain tissue, as previously described [11].To minimize population admixture, only self-declared non-Latino individuals of European ancestry were genotyped.Then, genotyping data from both platforms were processed using PLINK software, version 1.08p [12], with standard quality control (QC) metrics such as genotype success rate > 0.95, Hardy-Weinberg equilibrium p >0.001, and mishap test < 1 × 10 − 9, as previously described [11,13].EIGENSTRAT was used with default settings to remove population outliers and to generate a genotype covariance matrix [14], and closely related participants were removed.After these QC steps, 1,709 individuals and 750,173 autosomal markers from the Affymetrix GeneChip 6.0 platform, and 382 individuals and 627,742 autosomal markers from the Illumina OmniQuad Express platform were used for imputation.Dosages for SNPs (>35 million) were imputed on the haplotype reference consortium (HRC) panel.Analyses filtered SNPs based on minor allele frequency (MAF) > 0.01 and imputation INFO score > 0.3.This yielded 11,507,242 SNPs for analysis.
For BDR participants, DNA was extracted from brain tissue, and the NeuroChip was used, a custom Illumina genotyping array with an extensive genome-wide backbone (n = 306,670 variants) and a custom content covering 179,467 variants specific to neurological diseases [15].The quality control of the NeuroChip was completed in GenomeStudio (version 2.0, Illumina) and PLINK v1.9 [12].Manual curation of SNP clustering performed by Genome Studio algorithms was conducted on all SNPs.SNPs with ambiguous clustering of the three genotypes were removed.Likewise, individual sample signals which lie ambiguously between genotype clusters were also removed.Average GenTrain score, cluster separation and SNP call frequency were 0.83, 0.85 and 0.996, respectively, in the exported PLINK compatible files.Genotype data was aligned to the GRCh37/hg19 reference genome.SNPs with a minor allele frequency of less than 1%, had genotype calls of less than 95% and had control samples that significantly deviated from Hardy-Weinberg Equilibrium (p < 0.0001) were removed.Imputation was done using the 1000 Genomes Project reference panel and yielded 6,607,832 SNPs for analysis here.
Across the ROSMAP and BDR datasets, there were 5,091,857 SNPs in common.

Genome-wide association analyses
We performed GWAS on cortical clock age in DLPFC in ROSMAP, using linear regression models with cortical clock age as the dependent variable and genotype as the independent variable; covariates included age at death, sex, GWAS chip, study, and the top three principal components derived from the genetic covariance matrix.Association testing was done using an additive model for dose.Due to our modest sample size, for statistical significance, we used a suggestive threshold of p < 10 −5 , as has been done previously [16].We also removed SNPs if the beta estimate was greater than 30, as has been done in previous GWAS of epigenetic clocks [17], since such extreme values are not highly plausible.
We conducted a similar GWAS in the BDR data and meta-analysed results from ROSMAP and BDR, limited to the common SNPs across these cohorts; then, we used METAL to conduct fixedeffect meta-analysis combining results across the cohorts [18].The larger sample size in this metaanalysis has important advantages in terms of the stability of findings.For all GWAS, we calculated lambda values and created Q-Q plots of the -log10 p-values.
In sensitivity analyses, since cell proportions in bulk tissue can be related to aging and DNAm, we also considered potential confounding by neuron proportion.We estimated the neuron proportion using the CETS algorithm [19], which utilizes DNAm states from the Illumina array.We added neuron proportion as a covariate to our models of SNPs and cortical clock age.In additional sensitivity analyses, we controlled for seven cell-type proportions (i.e., inhibitory neurons, excitatory neurons, astrocytes, endothelial cells, oligodendrocytes, oligodendrocyte precursor cells, microglia), using single-cell methylation sequencing as the reference with the Houseman deconvolution algorithm [20].These were sensitivity analyses because we have consistently found in our previous DNAm research in brain tissue that control for cell-type proportion does not meaningfully change findings [4,5,9].

Clumping analysis
To assess the independent loci associated with clock age, we used PLINK to clump SNPs within the suggestive loci (250 kb), setting the linkage disequilibrium (LD) r 2 >0.5 as a threshold [12].The SNP with the most significant relation to clock age within each independent locus was selected as the leading SNP for that locus.

Functional annotation
We used FUMA (Functional Mapping and Annotation of Genome-wide Association Studies) [21] to examine the genetic foundations of cortical clock age.Specifically, we utilized the FUMA SNP2GENE function to annotate SNPs and map them to their respective genes, focusing on their biological functions.Our approach included positional mapping with a maximum distance of 10 kb from each SNP.We set the P-value threshold for lead SNPs at 10 −5 .Apart from these specific settings, we adhered to the standard configurations in SNP2GENE, such as an r 2 threshold of 0.6 for identifying independent significant SNPs.

Expression quantitative trait loci (eQTL) and colocalization analyses
We queried eQTL data from BrainMeta [22].This included eQTL mapping using RNA-seq data comprising 2,865 brain cortex samples obtained from 2,443 unrelated individuals of European ancestry with genome-wide SNP data.These RNAseq data were sourced from seven cohorts, including ROSMAP.
In addition to considering eQTL in BrainMeta, we also conducted colocalization analyses to test if lead SNPs from either the ROSMAP GWAS or the ROSMAP/BDR meta-analysis could be linked to epigenetic age by dysregulation of gene expression.Specifically, we considered our GWAS hits in the context of the bulk RNA-Seq from cortex along with publicly available eQTL and splicing QTL (sQTL) from the Myeloid Cells in Neurodegenerative Diseases (MyND) [23] and Microglia Genomic Atlas (MiGA) [24] datasets of gene expression.For colocalization analyses, we used the COLOC package from Giambartolomei et al. [25], with default parameters.Our criteria for considering a signal as colocalized was posterior probability of 0.8 or higher.
In further analyses, we also used single nucleus RNA-Seq profiling available in 424 of the ROSMAP DLPFC specimens as a way to better understand possible biologic pathways [24]; this included eQTLs from seven cell types (astrocytes, endothelial cells, excitatory neurons, inhibitory neurons, microglia, oligodendrocytes, and oligodendrocyte precursor cells [OPC]).Briefly, grey matter was processed in batches, and 5000 nuclei from each batch were pooled, prepared, and sequenced using either Illumina HiSeqX at the Broad Institute's Genomics Platform or Illumina NovaSeq 6000 at the New York Genome Center, with a target coverage of 1 million reads per channel [26,27].The data were first processed using the CellRanger software (v6.0.0; 10× Genomics) and quality controlled.Each nucleus was assigned back to its participants based on its genotypes.Cell types of nuclei were determined, and pseudobulk matrices were created by summing counts per individual.Within each cell type, genes with CPM > 1 in 80% of samples were retained and normalized using tmm.voom.
Finally, we also considered colocalization across the GWAS results for lead SNPs with seven cell types from the ROSMAP snRNA-Seq to identify common causal variants for cortical clock age and cell-type gene expression.

Functional enrichment analyses
We used the platform snpXplorer to conduct functional analyses [28].This included GO enrichment of biological processes and KEGG pathways, as well as an examination of any overlap of the SNPs and genes identified in our GWAS with previous associations reported in the GWAS catalog.We conducted two separate sets of functional analyses, one for the results of the ROSMAP GWAS and one for the meta-analysis of ROSMAP and BDR.

Analyses of aging phenotypes
As an additional approach to understand how SNPs of interest was related to function, we also examined the lead SNPs in relation to aging phenotypes available in ROSMAP.We first considered common postmortem neuropathologies, available in 1723 participants who also had SNP array data.
Due to the broad range of SNP frequencies, including some with fairly low prevalence, we focused on the most common neuropathologies.This included AD pathologies (global Alzheimer's disease pathology burden, amyloid-β load, PHFtau tangle density) and cerebrovascular pathologies (atherosclerosis and arteriolosclerosis, the two most common cerebrovascular pathologies in the ROSMAP cohorts); these neuropathologies are described in detail elsewhere [5,29].In addition to neuropathologies, we also examined lead SNPs in relation to key clinical phenotypes, available in 1885 participants from annual clinical assessments prior to death; this included two cognitive phenotypes and two motor phenotypes, described in detail elsewhere [5].Specifically, we examined the following: (i) global cognitive function at cohort baseline (the average of 17 cognitive tasks, such that higher scores represent better function), and slopes of cognitive decline from baseline to death; (ii) diagnosis of dementia as of death; (iii) motor function at baseline and over time (a combined score across 10 tests, such as grip strength and timed walking, such that higher scores represent better function); and (iv) Parkinsonian signs (the average of four domains of the 26-item modified United Parkinson's Disease Rating Scale: bradykinesia, gait, rigidity, tremor, such that higher scores represent worse function).
Finally, we also had data available on proteins in DLPFC among 849 participants.Thus, as a further way to confirm functionality, we examined relations of relevant proteins to these same aging phenotypes (using similar methods as described above).Specifically, based on the gene annotation described above, we identified corresponding proteins for the leading SNPs, when available.The proteins were measured using a multiplex mass spectrometry-based proteomics approach with tandem mass tag (TMTs) to analyse frozen tissue samples in the DLPFC.Briefly, 100 mg frozen sections were thawed on ice, with the grey matter dissected from the white matter, as previously described in detail [30].Details of the mass spectrometry -based proteomics, database searches, and quality control have been previously described [30].To summarize, the samples were homogenized, and the protein concentration was determined.After protein digestion, isobaric TMT peptide labelling and high pH fractionation were performed.Fractions were then analysed by liquid chromatography-mass spectrometry.The resulting mass spectrometry spectra were searched against the UniProt human protein database, with individual protein abundance checked against the global internal standard.An additional data process included regressing out technical confounders.A total of 8425 proteins passed the final quality control.
In all these analyses of aging phenotypes, for continuous phenotypes measured at death (e.g., global AD pathology), we used linear regression models; for continuous phenotypes measured repeatedly over time (e.g., global cognition), we used generalized linear mixed models.For categorical phenotypes (e.g., atherosclerosis severity, dementia diagnosis as of death), we used logistic regression models.All models controlled for age at death, sex, education, and cohort.
A flow chart summarizing our overall analytic approach is presented in Figure 1.
Approximately one-third of the ROSMAP sample was male, and participants were split fairly evenly across the ROS and MAP cohorts.Mean education was 16 y.Approximately 50% had pathologic AD, while 60% had clinical diagnosis of dementia as of death.The BDR cohort (n = 522) had a mean age at death of 83 (SD 9.0) y, and the mean cortical clock age was 84 y.Approximately half of the cohort was male, and mean education was approximately 13 y.Nearly half of the participants had pathologic AD, and about 60% had clinical dementia.

GWAS of cortical clock age
In our primary analyses, we used fixed-effect meta-analysis to combine the ROSMAP and BDR GWAS results, limited to the 5 million SNPs which were common after imputation to the Affymetrix GeneChip 6.0 and Illumina OmniQuad Express chip (from ROSMAP) as well as the NeuroChip custom array from BDR.We found 110 SNPs which met our suggestive criteria for statistical significance of p < 10 −5 (Figure 2 and eTable 1).We found no evidence of genomic inflation for this meta-analysis (eFigure 1).After excluding SNPs based on the frequency of effect allele, there were 13 independent loci; Table 2 lists the 13 leading SNPs.The strongest association was for rs4244620 (p = 1.29 × 10 −7 ) on chromosome 1, which was annotated with the gene CD46.CD46 gene encodes a type 1 membrane protein, which protects host cells from injury by complement and modulates T-cell activation [31].
In secondary analyses of our GWAS of cortical clock age in ROSMAP specimens, with a larger number of SNPs than available for the metaanalysis, we identified 304 SNPs meeting our predefined criteria for suggestive statistical significance (p < 10 −5 ) (Figure 2 and eTable 2).We found no evidence of genomic inflation (eFigure 1).After excluding SNPs with effect allele frequency less than 5%, we identified 'leading' SNPs from 24 independent loci.Table 3 shows the 24 lead SNPs with the highest statistical significance for each independent loci.The top SNP was rs4721030, which was close to genome-wide significant (p = 8.64 × 10 −8 ).This SNP was annotated with two nearby genes, TMEM106B (transmembrane protein 106B) and THSD7A (thrombospondin type 1 domain containing 7A) on chromosome 7. TMEM106B gene appears to be a key regulator of aging, including influences on microglial proliferation and survival and involvement in myelination and lysosomal pathways [32].TMEM106B has been previously associated with Alzheimer's dementia and other neurodegenerative phenotypes in multiple studies (i.e., TDP-43 proteinopathy, brain inflammation, and amyloid deposition in the brain) [30,[32][33][34][35][36][37]; further, our own previous research in ROSMAP has found links of TMEM106B with LATE-NC and worse cognitive resilience to neuropathology [16,38].
Most importantly, several SNPs were identified both in ROSMAP and in the meta-analysis of ROSMAP/BDR, suggesting consistent findings across the cohorts.In particular, rs4244620, was a leading SNP in both, including the strongest association in the meta-analysis.In addition, both sets of analyses pointed to rs34403329 on chromosome 8 (p = 2.48 × 10 −6 ), which was annotated with two  The SNP with the strongest p-value from the GWAS of ROSMAP, rs4721030, was not in the BDR data and thus was not be tested in the meta-analysis.) In sensitivity analyses, in models controlling for neuron proportion, we did not find any meaningful differences in the relations of leading SNPs to cortical clock age (eTable 3).Results were also largely consistent with analyses controlled for seven different cell-type proportions (eTable 4), which were estimated using single-cell methylation sequencing as the reference.
We also examined eQTLs in seven cell types from the single nucleus RNA-Seq in ROSMAP cortex (Figure 3b and eTable 5).Most interestingly, we found that rs4844620 exhibited significant eQTL effects for CD46 gene expression in six cell types, such that each additional effect allele was associated with lower CD46 expression (Figure 3b).The strongest associations were observed in oligodendrocyte precursor cells (p = 4.95 × 10 −33 ) and in oligodendrocytes (p = 3.32 × 10 −46 ).For the SNPs from the ROSMAP/BDR metaanalysis, there were no further significant single nucleus eQTLs beyond those for rs4844620.

Colocalization across clock age and quantitative trait loci
To better evaluate whether lead SNPs may be associated with both clock age as well as gene We also conducted colocalization analyses across our lead SNPs and the seven cell types in the ROSMAP single-nucleus eQTL (Figure 3c).We found that observed associations with cortical clock age and with CD46 expression in oligodendrocytes appeared to be due to the single colocalizing variant rs4844620 (PP > 0.8).Thus, the CD46 gene with respect to oligodendrocyte biology may have particular relevance.

Functional analysis
When we used snpXplorer to examine biologic processes that may underlie cortical clock age, in the lead SNPs, there were no significant findings after correction for multiple comparisons.Nonetheless, two pathways were nominally significant (eTable 6): drug metabolism (p = 0.009) and pyrimidine nucleoside catabolic processes (p = 0.04).With snpXplorer, we also identified SNPs and genes that overlapped with previous GWAS association studies.We found that rs4844620 was reported in previous GWAS of age-related macular degeneration.In addition, several genes from our GWAS were previously related to type 2 diabetes and to waist circumference, as well as to depression and cognitive function.These results may suggest that metabolism or metabolic dysregulation underlie DNA methylation age in brain.

Aging phenotypes
In our last steps, to help identify clinical relevance, we examined if each leading SNP was associated with aging phenotypes measured in the ROSMAP cohorts, after controlling for age at death, sex, education, and cohort (eTable 7).It is difficult to determine appropriate correction for multiple comparisons, since many of these aging outcomes are correlated with each other; we considered nominally significant findings for any phenotype (Figure 4a) and focus here on the strongest results.Perhaps most interesting, rs4844620 (which was identified in the ROSMAP GWAS, the ROSMAP/ BDR meta-analysis, and exhibited cis-eQTL effects for CD46 in the bulk RNA-Seq and snRNA-Seq, as described above), was associated with lower level of baseline cognition (β=-0.10,p = 0.030), faster slopes of cognitive decline (β=-0.01,p = 0.007), and with greater level of Parkinsonian signs at baseline (β = 0.11, p = 0.04), indicating pleiotropic relations with cognitive and motor phenotypes.Among additional leading SNPs, rs9557340 was related to multiple neuropathologies and to cognition.Specifically, this SNP was associated with higher global AD pathologic burden (β = 0.04, p = 0.0047), greater β-amyloid load (β = 0.007, p = 0.014), greater tau tangle density (β = 0.11, p = 0.0022), higher odds of dementia (odds ratio = 1.17, p = 0.016), and to faster slopes of cognitive decline (β=-0.10,p = 0.014).
Finally, using available proteomics data in over 800 ROSMAP participants, we were able to select 14 proteins measured in DLPFC (ACTR3B, ASAH1, CD46, EPHA7, LARP1, MOXD1, OLA1, NAT1, SLITRK5, POSTN, STAU2, TMEM106B, THSD7A, TMBIM6), which corresponded to annotated genes or eQTLs for lead SNPs in our GWAS.We examined the relation of these 14 proteins with neuropathologies, cognitive, and motor phenotypes, after controlling for age at death, sex, education, and cohort (eTable 8).Again, we considered any nominally significant relation of protein level to phenotypes, and focus here on the strongest findings (Figure 4B).Notably, our GWAS SNP rs4721030 on chromosome 7 was annotated with two genes, TMEM106B and THSD7A; we found broad relations of both proteins to most of the aging phenotypes we examined.In particular, TMEM106B protein levels were strongly related to odds of dementia (p = 0.0004) and lower level of cognitive function at baseline (p = 0.007), as well as to worse motor function (motor function at baseline, p = 0.001; slopes of decline, p < 10 −5 ) and more Parkinsonian signs (baseline, p = 0.01; slopes, p < 10 −5 ).THSD7A protein levels were strongly to greater global AD pathology, to greater amyloid-β load and PHFtau tangle density (all p < 10 −5 ), as well as to lower level of cognitive function at baseline (p = 0.007) and faster slopes of cognitive decline (p = 0.001).

Discussion
We conducted a GWAS of cortical epigenetic clock age calculated in brain tissue, among 1,340 older, deceased participants from three cohorts; we further leveraged transcriptomic, proteomic, and phenotypic data from participants, as well as publicly available data, to identify converging evidence of genes underlying cortical epigenetic clock age.In particular, our results supported several genes, especially CD46, as candidates of interest with potentially broad roles in cortical clock age.
Several previous studies have also conducted GWAS of epigenetic clocks in both peripheral colocalization analyses demonstrate the likelihood that a SNP is causal for both cortical clock age and for expression levels.The value to the right of each point is the posterior probability (PP) that the SNP is related to clock age and to the QTL; PP > 0.8 is standardly considered as high likelihood the SNP is causal for both traits.ROSMAP eQTL are from frontal cortex; MiGA are from primary monocytes; MyND are primary microglia; single cell eQTL are from ROSMAP frontal cortex.
blood and brain tissue [45][46][47][48].Consistent with our findings, the largest GWAS of five clocks, calculated in blood samples (n = 40,000), 45 prioritized CD46 as a possible determinant of epigenetic age from the Horvath clock.Additionally, CLYBL (Citramalyl-CoA Lyase) was prioritized as a predictor of the PhenoAge clock; we found that a loci (rs9557340) in the CLYBL gene was related For continuous outcomes, we show mean differences from linear or linear mixed effects models; for categorical outcomes, we show odds ratios from logistic regression models.All models controlled for age at death, sex, and education.Of particular interest here, rs4844620 was related to lower baseline cognition (p = 0.031), steeper slopes of cognitive decline over time (p = 0.007), higher baseline levels of Parkinsonian signs (p = 0.046).Additionally, rs9557340 was related to more global AD neuropathology (p = 0.0047), more amyloid-β load (p = 0.014), greater PHFtau tangle density (p = 0.0022), lower baseline cognition (p = 0.014) and higher odds of dementia (p = 0.0038).panel b: we were able to examine protein levels in frontal cortex, based on annotated genes in our GWAS (14 genes of interest had corresponding protein levels in our proteomics data).We show here results for proteins which reached nominal significance for phenotypes.Of particular interest here, higher levels of the protein THSD7A in prefrontal cortex were related to more global AD pathology (p < 10 −5 ), more amyloid-β load (p < 10 −5 ), greater PHFtau tangle density (p < 10 −5 ), lower baseline cognition (p = 0.007), steeper slopes of cognitive decline (p = 0.001) and higher odds of dementia (p = 0.046).Higher levels of TMEM106B protein were related to greater PHFtau tangle density (p = 0.03), more parkinsonism at baseline (p = 0.01) and steeper slopes over time (p = 0.0001), lower motor function at baseline (p = 0.001) and steeper slopes (p < 10 −5 ), lower baseline cognition (p = 0.007), steeper slopes of cognitive decline (p = 0.04) and higher odds of dementia (p = 0.0004).Solid colors indicate significant association.
to cortical clock age and was strongly related to AD neuropathologies and to cognition -suggesting clinical relevance of CLYBL gene to brain aging.However, in a GWAS (n = 1100) of the Horvath clock in brain tissue (which included some of our ROSMAP participants) [47], there was no overlap of their primary findings in prefrontal cortex with ours for the cortical clock; these samples spanned a much wider age range than ours, and there may be differences in genetic correlates of aging at different ages.Nonetheless, overall, there is some evidence that specific genes may have systemic effects on biologic aging.
In terms of our own findings, CD46 gene encodes the membrane cofactor protein, an ubiquitously expressed complement regulatory protein.CD46 protects host cells from injury by complement and appears to link innate and adaptive immune responses.The complement system regulates central nervous system function, and complement-mediated neuroinflammation may be involved in a range of aging-related neurodegenerative disorders, from cognitive impairment to age-related macular degeneration [49].Other genes in the complement system are strongly related to dementia, including CR1, C1S, CD33, and TREM2; interestingly, CR1, a well-established genetic risk factor for dementia, is a receptor for complement C3b and C4b proteins, and CD46 mediates inactivation of C3b and C4b, pointing to potential mechanistic pathways [50].Further, in our results, rs4844620 appeared to be a common causal SNP across epigenetic age and eQTLs for CD46 in cortex tissue, as well as in primary monocytes and microglial samples, supporting the role of CD46 in immune response in the brain.In addition, our single nucleus data from DLPFC further identified eQTL effects for CD46 in oligodendrocytes, and colocalization of this SNP with epigenetic age and CD46 expression in oligodendrocytes.This may support recent data that oligodendrocytes could have immunomodulatory properties [51].Overall, there is compelling evidence, from our study and others, to support CD46 as a target for further research.
In additional results from the GWAS of ROSMAP alone, the strongest loci in relation to clock age in our GWAS was rs4721030 on chromosome 7, which was associated with older epigenetic age, and nearly reached genome-wide significance.We note that the beta value we found for this SNP in relation to clock age was 8.7, which is much larger than many of our other findings, indicating more caution is warranted in interpreting this result.For context, the mean beta value in our 304 SNPs of interest in ROSMAP was 2.6 (SD 2.5); that is, the beta for rs4721030 was 2.4 standard deviations greater than the mean beta in our research here, which is large although not an extreme outlier.Further, we could not examine this SNP in the meta-analysis, since it was not present in the BDR data, even after imputation, likely due to the highly specialized array used by BDR.For these reasons, this finding should be carefully considered and interpreted cautiously.Nonetheless, the SNP was annotated with two genes, TMEM106B and THSD7A.Previous work has implicated the TMEM106B gene in neurodegenerative phenotypes, especially TDP-43 pathology [38].THSD7A is a soluble form of membraneassociated N-glycoprotein, produced by cells of endothelial and neuronal origin.In zebrafish, THSD7A may have a role in the neurovasculature [52], and THSD7A has been related to educational attainment [53]; in recent research using GWAS summary statistics, TMEM106B/THSD7A were reported as pleiotropically related to dementia and major depression disorder [54].Interestingly, we found associations of higher THSD7A/ TMEM106B protein levels in ROSMAP DLPFC with a wide array of aging phenotypes, from AD pathologies to lower cognitive function to worse motor function.Thus, our findings in GWAS as well as in cortical protein levels, together with the existing literature, may suggest that TMEM106B and THSD7A have relevance to brain aging.
There are substantial strengths to our work here.We assembled a sample of older individuals for this GWAS of cortical clock age; the uniform older ages of participants may be particularly important for evaluating the genetic architecture of cortical clock age, as there is substantial evidence that epigenetic clocks function differently at older than at younger ages [3,6].We also had available a wide array of genomic and phenotypic data; this enabled us to validate and prioritize genes of interest despite the absence of a replication dataset.Limitations of our study should also be considered.Most importantly, our sample size was small for detecting genome-wide significant findings.However, many SNPs achieved a predetermined, 'suggestive' threshold of 10 −5 , which we [16] and others [55] have previously used.Further, to prioritize candidate loci, we were able to validate results with a range of additional information, from gene expression (both bulk and single-cell expression) to protein levels in brain tissue and evaluation of neuropathologic and clinical aging phenotypes.Thus, we were able to confirm findings via multiple lines of converging evidence.Nonetheless, we likely missed important SNPs which did not meet even our suggestive threshold.Second, the DNAm states as well as the transcriptomic and proteomic data were all derived from cortex.Thus, we may have missed genes that are primarily relevant in other brain regions.However, we focus on DLPFC given its critical role in a wide range of higher order human behaviour; additionally, evidence suggests a good correlation of epigenetic age across brain regions [4,47].Finally, another limitation is our primarily non-Latino white participants, across all three cohorts.Evidence indicates that neurodegenerative diseases of aging, such as dementia, differ across diverse groups; for example, older African Americans appear to have higher risk of dementia, and Latinx individuals appear to develop dementia at younger ages, than non-Hispanic whites.[56,57].Thus, future research is clearly needed in minoritized older participants to better understand brain aging in groups who often have worse health.

Figure 2 .
Figure 2. Manhattan plot: GWAS of cortical clock age for ROSMAP and Meta-analysis of ROSMAP/BDR.* In the top panel, Manhattan plot shows the Cortical Epigenetic Clock GWAS results for ROSMAP.The bottom panel shows the GWAS results for the meta-analysis combining ROSMAP and BDR.Dashed red line represents a suggestive threshold of p = 10 −5 .The solid-line represents the genome-wide significance level of p = 5 × 10 −8 .Labels show the closest genes to GWAS lead SNPs with MAF >5%.

Figure 3 .
Figure 3. Expression quantitative trait Loci** in panel a, Heatmap show the GWAS lead SNPs that are FDR significant eQTL in bulk brain RNAseq (BrainMeta), and sn-RNAseq cell specific eQTL.*P ≤ 0.05, **P ≤ 0.01, ***P ≤ 0.001.The strongest relation of SNP to expression level was identified for rs4844620 with CD46.In panel b, we show CD46 expression levels across genotypes of rs4844620 in 7 cell types from ROSMAP single nucleus RNAseq.This was the strongest finding for the snRnaseq results.In panel c,

Figure 4 .
Figure 4. Relations of SNPs and cortical proteins to aging Phenotypes**, panel a: we show relations of leading SNPs to aging phenotypes, for SNPs with nominally significant relations with phenotypes.For continuous outcomes, we show mean differences from linear or linear mixed effects models; for categorical outcomes, we show odds ratios from logistic regression models.All models controlled for age at death, sex, and education.Of particular interest here, rs4844620 was related to lower baseline cognition (p = 0.031), steeper slopes of cognitive decline over time (p = 0.007), higher baseline levels of Parkinsonian signs (p = 0.046).Additionally, rs9557340 was related to more global AD neuropathology (p = 0.0047), more amyloid-β load (p = 0.014), greater PHFtau tangle density (p = 0.0022), lower baseline cognition (p = 0.014) and higher odds of dementia (p = 0.0038).panel b: we were able to examine protein levels in frontal cortex, based on annotated genes in our GWAS (14 genes of interest had corresponding protein levels in our proteomics data).We show here results for proteins which reached nominal significance for phenotypes.Of particular interest here, higher levels of the protein THSD7A in prefrontal cortex were related to more global AD pathology (p < 10 −5 ), more amyloid-β load (p < 10 −5 ), greater PHFtau tangle density (p < 10 −5 ), lower baseline cognition (p = 0.007), steeper slopes of cognitive decline (p = 0.001) and higher odds of dementia (p = 0.046).Higher levels of TMEM106B protein were related to greater PHFtau tangle density (p = 0.03), more parkinsonism at baseline (p = 0.01) and steeper slopes over time (p = 0.0001), lower motor function at baseline (p = 0.001) and steeper slopes (p < 10 −5 ), lower baseline cognition (p = 0.007), steeper slopes of cognitive decline (p = 0.04) and higher odds of dementia (p = 0.0004).Solid colors indicate significant association.

Table 1 .
Characteristics of study cohorts.
*Educational attainment was reported by 369 participants in the Brains for Dementia Research cohort, and clinical dementia status was known in 443 participants.

Table 2 .
Leading SNPs from GWAS of cortical clock: meta-analysis of ROSMAP and BDR.